Historical document image segmentation using background light intensity normalization
Internal identifier: 001395 (Main/Exploration)
Authors: Zhixin Shi [United States]; Venugopal Govindaraju [United States]
Source:
- SPIE proceedings series [1017-2653]; 2005.
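French descriptors
- Pascal (Inist): Traitement image document, Segmentation image, Algorithme adaptatif, Fonction non linéaire, Meilleure approximation, Détection seuil, Echelle gris, Image niveau gris, Caractère manuscrit, Reconnaissance optique caractère, Traitement image
English descriptors
- KwdEn: Adaptive algorithm, Best approximation, Document image processing, Gray scale, Grey level image, Image processing, Image segmentation, Manuscript character, Non linear function, Optical character recognition, Threshold detection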
Abstract
This paper presents a new document binarization algorithm for camera images of historical documents, in particular those held by the Library of Congress of the United States. The algorithm applies background light intensity normalization to enhance an image before a local adaptive binarization algorithm is applied. The normalization step uses an adaptive linear or non-linear function to approximate the uneven background of the image, which results from the uneven surface of the document paper, aged color, or the uneven light source of the cameras used for image capture. Our algorithm adaptively captures the background of a document image with a "best fit" approximation. The document image is then normalized with respect to the approximation before a thresholding algorithm is applied. The technique works for both gray scale and color historical handwritten document images, with significant improvement in readability for both human readers and OCR.
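The pipeline described in the abstract (approximate the background, normalize, then threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the simplest linear case, a single least-squares plane fitted to the whole image, and it replaces the local adaptive binarization step with one global ratio threshold. The function names `fit_plane` and `binarize_normalized` and the parameters `ink_ratio` and `refit_ratio` are hypothetical and chosen only for this illustration.

```python
def fit_plane(samples):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    mz = sum(s[2] for s in samples) / n
    # Centered sums give a 2x2 system for the two slopes.
    sxx = sum((x - mx) ** 2 for x, _, _ in samples)
    syy = sum((y - my) ** 2 for _, y, _ in samples)
    sxy = sum((x - mx) * (y - my) for x, y, _ in samples)
    sxz = sum((x - mx) * (z - mz) for x, _, z in samples)
    syz = sum((y - my) * (z - mz) for _, y, z in samples)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("samples are collinear; a plane cannot be fitted")
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    c = mz - a * mx - b * my
    return a, b, c


def binarize_normalized(gray, ink_ratio=0.8, refit_ratio=0.9):
    """Return a boolean grid (True = ink) for a grayscale image.

    gray is a list of rows of intensities (bright paper, dark ink).
    Assumption: the background is well approximated by one plane.
    """
    h, w = len(gray), len(gray[0])
    samples = [(x, y, float(gray[y][x])) for y in range(h) for x in range(w)]

    # First pass: fit the plane to every pixel (ink pulls it slightly down).
    a, b, c = fit_plane(samples)

    # Second pass: refit using only pixels that look like background.
    bright = [(x, y, z) for x, y, z in samples
              if z >= refit_ratio * (a * x + b * y + c)]
    if len(bright) >= 3:
        a, b, c = fit_plane(bright)

    # Normalize each pixel by the estimated background, then threshold.
    ink = []
    for y in range(h):
        row = []
        for x in range(w):
            background = max(a * x + b * y + c, 1e-6)  # avoid division by zero
            row.append(gray[y][x] / background < ink_ratio)
        ink.append(row)
    return ink
```

Calling `binarize_normalized(gray)` on a list of pixel rows returns a grid of booleans of the same size. Dividing by the fitted plane is one possible reading of "normalized with respect to the approximation"; subtracting the plane would be another, and a non-linear background would need a richer model than the single plane assumed here.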
Affiliations:
- United States
- New York State
- Buffalo (New York)
- State University of New York, State University of New York at Buffalo
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000458
- to stream PascalFrancis, to step Curation: 000330
- to stream PascalFrancis, to step Checkpoint: 000404
- to stream Main, to step Merge: 001434
- to stream Main, to step Curation: 001395
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Historical document image segmentation using background light intensity normalization</title>
<author><name sortKey="Zhixin Shi" sort="Zhixin Shi" uniqKey="Zhixin Shi" last="Zhixin Shi">ZHIXIN SHI</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Amherst</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Amherst</wicri:noRegion>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">05-0360928</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 05-0360928 INIST</idno>
<idno type="RBID">Pascal:05-0360928</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000458</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000330</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000404</idno>
<idno type="wicri:doubleKey">1017-2653:2005:Zhixin Shi:historical:document:image</idno>
<idno type="wicri:Area/Main/Merge">001434</idno>
<idno type="wicri:Area/Main/Curation">001395</idno>
<idno type="wicri:Area/Main/Exploration">001395</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Historical document image segmentation using background light intensity normalization</title>
<author><name sortKey="Zhixin Shi" sort="Zhixin Shi" uniqKey="Zhixin Shi" last="Zhixin Shi">ZHIXIN SHI</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Amherst</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Amherst</wicri:noRegion>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptive algorithm</term>
<term>Best approximation</term>
<term>Document image processing</term>
<term>Gray scale</term>
<term>Grey level image</term>
<term>Image processing</term>
<term>Image segmentation</term>
<term>Manuscript character</term>
<term>Non linear function</term>
<term>Optical character recognition</term>
<term>Threshold detection</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Traitement image document</term>
<term>Segmentation image</term>
<term>Algorithme adaptatif</term>
<term>Fonction non linéaire</term>
<term>Meilleure approximation</term>
<term>Détection seuil</term>
<term>Echelle gris</term>
<term>Image niveau gris</term>
<term>Caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Traitement image</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a new document binarization algorithm for camera images of historical documents, in particular those held by the Library of Congress of the United States. The algorithm applies background light intensity normalization to enhance an image before a local adaptive binarization algorithm is applied. The normalization step uses an adaptive linear or non-linear function to approximate the uneven background of the image, which results from the uneven surface of the document paper, aged color, or the uneven light source of the cameras used for image capture. Our algorithm adaptively captures the background of a document image with a "best fit" approximation. The document image is then normalized with respect to the approximation before a thresholding algorithm is applied. The technique works for both gray scale and color historical handwritten document images, with significant improvement in readability for both human readers and OCR.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Buffalo (New York)</li>
</settlement>
<orgName><li>Université d'État de New York</li>
<li>Université d'État de New York à Buffalo</li>
</orgName>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Zhixin Shi" sort="Zhixin Shi" uniqKey="Zhixin Shi" last="Zhixin Shi">ZHIXIN SHI</name>
</noRegion>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001395 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001395 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:05-0360928 |texte= Historical document image segmentation using background light intensity normalization }}
This area was generated with Dilib version V0.6.32.